Geospatial analysis of type 2 diabetes mellitus and hypertension in South Sulawesi, Indonesia

The spatial variation of type 2 diabetes mellitus (T2DM) and hypertension and their potential linkage were explored in South Sulawesi Province, Indonesia. Global Moran's I and regression analyses were used to characterize these patterns. The analyses were based on T2DM and hypertension data from 2017 and 2018 acquired from the Social Health Insurance Administration in Indonesia. The prevalence rates of T2DM and hypertension showed no significant spatial clustering and tended to occur randomly (p = 0.678 and p = 0.711, respectively). Using generalized Poisson regression analysis, our study showed a significant relationship between T2DM and hypertension (p ≤ 0.001). This research could help policy makers plan and support projects aimed at reducing the risk of T2DM and hypertension.

Data utilized. The data used in this study cover health insurance participants with T2DM or hypertension from January 2017 to December 2018: 496 people with T2DM and 2597 people with hypertension, spread across 24 districts and cities in South Sulawesi Province. In this research, the districts and cities are the spatial units of analysis. The data were obtained from the Social Health Insurance Administration Body, a government organization established to provide a health insurance program for Indonesians. The population data were obtained from the Central Bureau of Statistics of South Sulawesi Province, locally known as Badan Pusat Statistik (BPS), a non-departmental government institute in Indonesia responsible for conducting statistical surveys. The daily data on health insurance participants were aggregated by month, and the prevalence rate was calculated per 100,000 people.
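The monthly aggregation of daily records can be sketched as follows. This is a minimal illustration, assuming a hypothetical record layout of `(visit_date, district, disease)` tuples in which one record corresponds to one case; the actual BPJS data schema is not described in the text, and the district names below are placeholders, not study data.

```python
from collections import Counter
from datetime import date

def aggregate_by_month(records):
    """Count daily records per (district, disease, year, month).

    Assumes each record is a (visit_date, district, disease) tuple and that one
    record is one case; this layout is hypothetical, not the BPJS schema.
    """
    counts = Counter()
    for visit_date, district, disease in records:
        counts[(district, disease, visit_date.year, visit_date.month)] += 1
    return counts

# Hypothetical records for illustration only.
records = [
    (date(2017, 1, 3), "District A", "T2DM"),
    (date(2017, 1, 20), "District A", "T2DM"),
    (date(2017, 2, 1), "District A", "hypertension"),
    (date(2018, 1, 5), "District B", "hypertension"),
]
monthly = aggregate_by_month(records)
print(monthly[("District A", "T2DM", 2017, 1)])  # 2
```

If participants can appear on several days, counting unique participant identifiers per month instead of records would avoid double counting; the text does not say which convention was used.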
The daily T2DM and hypertension data used in this study were converted into prevalence rates for the two diseases. The prevalence rate per 100,000 population in district/city i was calculated by the formula:
$$\text{Prevalence rate}_i = \frac{\text{number of cases in district } i}{\text{population of district } i} \times 100{,}000 \qquad (1)$$
The prevalence rate was calculated annually, and the prevalence rate for the 2017-2018 period is the average of the 2017 and 2018 values in each district/city. In addition, we calculated the minimum, maximum, mean and standard deviation of the prevalence rate for both diseases.
www.nature.com/scientificreports/
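Eq. (1), the two-year averaging and the summary statistics can be sketched as below. The district figures are illustrative placeholders, not the study data, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say whether a sample or population standard deviation was reported.

```python
import statistics

def prevalence_rate(cases, population, per=100_000):
    """Eq. (1): cases per `per` population in one district/city."""
    return cases / population * per

def period_prevalence(yearly):
    """Average of annual rates, e.g. {2017: (cases, pop), 2018: (cases, pop)}."""
    rates = [prevalence_rate(c, p) for c, p in yearly.values()]
    return sum(rates) / len(rates)

def summarize(rates):
    """Minimum, maximum, mean and sample standard deviation across districts."""
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": statistics.mean(rates),
        "sd": statistics.stdev(rates),  # assumption: sample SD (n - 1)
    }

# Illustrative numbers only, not the study data.
districts = {
    "District A": {2017: (3, 250_000), 2018: (5, 250_000)},
    "District B": {2017: (10, 400_000), 2018: (14, 400_000)},
    "District C": {2017: (1, 200_000), 2018: (3, 200_000)},
}
rates = {name: period_prevalence(years) for name, years in districts.items()}
summary = summarize(list(rates.values()))
print(rates)
print(summary)
```

Averaging the two annual rates (rather than pooling cases and population over both years) follows the description in the text; the two approaches differ slightly when a district's population changes between years.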
Statistical analysis. In this paper, we used Spatial Regression (SR) and Generalized Poisson Regression (GPR) models to examine the relationship between hypertension and T2DM. The models were fitted by maximum likelihood to assess the association between the number of hypertension cases (the dependent variable) and the number of T2DM cases (the independent variable). Inference from the GPR models is based on the estimated coefficient of the T2DM variable and its standard error. If the interval (estimated coefficient − standard error; estimated coefficient + standard error) contains zero, the independent variable does not significantly affect the dependent variable. Otherwise, the independent variable has a significant effect on the dependent variable.
Inference was also conducted using the probability value (p-value). If the p-value is less than the significance level (0.05), the independent variable has a significant influence on the dependent variable; otherwise, it does not. All statistical analyses were performed at a 5% significance level using RStudio version 1.2.5033 as the computation tool.
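The two decision rules described above can be written out as a short sketch. The p-value here is an assumption: it uses a two-sided Wald z test with a standard normal reference, a common choice for maximum likelihood coefficients, whereas the text does not state how the software computed it. The example inputs are the Poisson-model coefficient (0.005) and standard error (0.0005) reported in the Results.

```python
import math

def se_interval_excludes_zero(coef, se):
    """Rule from the text: significant if (coef - se, coef + se) does not contain 0."""
    lower, upper = coef - se, coef + se
    return not (lower <= 0.0 <= upper)

def wald_p_value(coef, se):
    """Two-sided p-value for z = coef / se, assuming a standard normal reference."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

def is_significant(p_value, alpha=0.05):
    """Rule from the text: significant when p < alpha."""
    return p_value < alpha

coef, se = 0.005, 0.0005  # Poisson-model values reported in the Results
print(se_interval_excludes_zero(coef, se), wald_p_value(coef, se))
```

Under normality an interval of ±1 standard error has roughly 68% coverage, so the interval rule is less strict than the 5% p-value rule; the two rules agree here because z = 10.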
Natural break classification method. The natural break method is a statistical procedure for identifying clusters of values that are characteristic of a data distribution. Its algorithm minimizes the variance within classes and maximizes the variance between classes. The resulting classes can be displayed as a map with graduated colors.
Spatial cluster analysis. Global Moran's I statistics were used to identify spatial clusters of disease. Global Moran's I is a global spatial autocorrelation statistic that measures the correlation between the value of a variable at one location and the values of the same variable at neighboring locations. If the Global Moran's I statistic was significant, we used Local Moran's I and Getis-Ord Gi* 6,7 to identify areas characterized as hotspots (concentrations) and coldspots (absence) of T2DM and hypertension. These statistics classify spatial patterns into clusters and outliers: clusters can be either positive (hotspots) or negative (coldspots), while outliers are spatial objects whose attribute values differ distinctly from those of their spatial neighbors.
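Global Moran's I and a permutation test for its significance can be sketched as follows. This is a minimal illustration in pure Python, not the software used in the study. The two-sided permutation p-value (distance from the expected value under randomness) is one of several conventions in use, and `path_weights` is a hypothetical helper that builds binary contiguity weights for areas arranged in a line.

```python
import random

def morans_i(values, weights):
    """Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def permutation_test(values, weights, n_perm=999, seed=1):
    """Shuffle values across areas to build a reference distribution for I."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    expected = -1.0 / (len(values) - 1)  # E[I] under spatial randomness
    shuffled = list(values)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

def path_weights(n):
    """Hypothetical binary contiguity: area i neighbors areas i - 1 and i + 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

observed, p = permutation_test([1, 2, 3, 4, 5, 6], path_weights(6))
print(observed, p)
```

With 24 districts/cities, the expected value under randomness is −1/23 ≈ −0.0435, so values of I close to that, together with large p-values, indicate no detectable spatial clustering.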
Regression models. We measured the effect of T2DM on hypertension using the SR model 6 and the GPR model 8. In this study, we assessed the relationship between the number of T2DM cases and the number of hypertension cases using SR analysis and the GPR model, with T2DM as the independent variable and hypertension as the dependent variable. Hypothesis tests are used to draw a valid conclusion as to whether the independent variable affects the dependent variable. The SR model is used to assess whether there is a spatial contribution to hypertension. The GPR model is used because the dependent variable, the number of hypertension cases, is a count. A spatial regression model is a regression model with spatial dependence either in the response variable (Spatial Lag Model/SLM) or in the random error component (Spatial Error Model/SEM). In contrast, the classic regression model, $Y = X\beta + \varepsilon$, has no spatial dependency. The spatial lag model is
$$Y = \rho W Y + X\beta + \varepsilon \;\Longleftrightarrow\; Y = (I - \rho W)^{-1}(X\beta + \varepsilon),$$
and the spatial error model is
$$Y = X\beta + u, \quad u = \lambda W u + \varepsilon \;\Longleftrightarrow\; Y = X\beta + (I - \lambda W)^{-1}\varepsilon,$$
where Y is the response variable; X the predictor variable; W the normalized spatial weight matrix; β the coefficient of the predictor variable; ε a random error component; I the identity matrix; u a spatial random error; ρ the spatial effect of the SLM; and λ the spatial effect of the SEM. If ρ or λ is not significantly different from zero (p-value > 0.05), there is no spatial dependency.
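The normalized spatial weight matrix W and the spatial lag term WY used in these models can be sketched as below. This assumes row-standardization of a contiguity (neighbor) list, a common way to normalize W; the adjacency list is a toy example, and the text does not specify which neighbor definition was used for the 24 districts/cities.

```python
def row_standardize(neighbors, n):
    """Build a row-normalized weight matrix W from an adjacency list.

    Each row sums to 1; an area with no neighbors keeps a row of zeros.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = neighbors.get(i, [])
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

def spatial_lag(W, y):
    """Compute WY: with row-normalized W, the mean of y over each area's neighbors."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in W]

# Toy example: three areas in a row (0 - 1 - 2).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = row_standardize(neighbors, 3)
print(W)
print(spatial_lag(W, [1.0, 2.0, 3.0]))
```

In the SLM, ρ scales this neighbor average of the response; in the SEM, λ applies the same operator to the error term u instead.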
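The generalized Poisson regression (GPR) model. Let $Y_i$ be a count response variable that follows the GPR distribution. The probability function of $Y_i$ is
$$f(y_i;\mu_i,\alpha) = \left(\frac{\mu_i}{1+\alpha\mu_i}\right)^{y_i}\frac{(1+\alpha y_i)^{y_i-1}}{y_i!}\exp\left[-\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}\right],$$
where $y_i = 0, 1, 2, \ldots$; $\mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$; $\mathbf{x}_i$ is a $(p-1)$-dimensional vector of covariates; $\boldsymbol{\beta}$ is a $p$-dimensional vector of covariate coefficients (parameters); $\alpha$ is the dispersion parameter of the GPR; and $p$ is a positive integer. The mean and variance of the GPR distribution are $E(Y_i \mid \mathbf{x}_i) = \mu_i$ and $\mathrm{Var}(Y_i \mid \mathbf{x}_i) = \mu_i(1+\alpha\mu_i)^2$, respectively. If $\alpha = 0$, the GPR model reduces to the Poisson regression model.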
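A numerical check of the GPR distribution can be sketched as follows. It assumes Famoye's restricted generalized Poisson probability function, which is the form whose mean is µ and whose variance is µ(1 + αµ)², matching the moments stated above; the sketch only handles α ≥ 0, and the chosen µ and α are arbitrary test values.

```python
import math

def gpr_pmf(y, mu, alpha):
    """Restricted generalized Poisson P(Y = y) with mean mu > 0 and dispersion alpha >= 0.

    f(y) = (mu / (1 + alpha*mu))**y * (1 + alpha*y)**(y - 1) / y!
           * exp(-mu * (1 + alpha*y) / (1 + alpha*mu)), evaluated in log space.
    """
    if alpha < 0:
        raise ValueError("this sketch only handles alpha >= 0")
    theta = mu / (1.0 + alpha * mu)
    log_p = (
        y * math.log(theta)
        + (y - 1) * math.log1p(alpha * y)
        - math.lgamma(y + 1)
        - mu * (1.0 + alpha * y) / (1.0 + alpha * mu)
    )
    return math.exp(log_p)

def moments(mu, alpha, y_max=400):
    """Total probability, mean and variance by summing the pmf up to y_max."""
    probs = [gpr_pmf(y, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = moments(3.0, 0.1)
print(total, mean, var)  # var should be close to 3 * (1 + 0.1 * 3) ** 2
```

Setting `alpha=0.0` reproduces the Poisson probabilities, which mirrors the statement that the GPR model reduces to Poisson regression when α = 0.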
Inference with the regression models is conducted by estimating parameter values from the available observational data. The parameters of both the SR and GPR models were estimated by maximum likelihood. After the parameters are estimated, the regression model is evaluated using the analysis of variance (ANOVA) table, which summarizes the levels of variability within a regression model and forms the basis for tests of significance. The ANOVA table contains the F test statistic for testing the null hypothesis β = 0 against the alternative β ≠ 0. The F statistic is the ratio of the mean square of the model to the mean square error. A large ratio is evidence against the null hypothesis.
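The F ratio described above can be illustrated for a linear model with a single predictor. This is a minimal sketch on toy numbers: it shows the mean-square decomposition for ordinary least squares and does not reproduce how the GPR ANOVA table in this study was computed. The returned R² is the share of variation explained by the model.

```python
def simple_regression_anova(x, y):
    """OLS fit of y on one predictor x, with the ANOVA F ratio and R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_model = ss_total - ss_error
    ms_model = ss_model / 1        # one predictor -> 1 model degree of freedom
    ms_error = ss_error / (n - 2)  # n - 2 residual degrees of freedom
    return {"slope": slope, "F": ms_model / ms_error, "R2": ss_model / ss_total}

# Toy data, not the study data.
result = simple_regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

The F value is compared with an F distribution with 1 and n − 2 degrees of freedom; with one predictor, this F equals the square of the t statistic for the slope.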

Results
The prevalence rates of T2DM in each district/city in the 2017-2018 period are compared in Table 1 and visualized in five groups in Fig. 2. Table 1 shows that the prevalence rate of T2DM for the 2017-2018 period was generally around 1-2 cases per 100,000 population in each district/city in South Sulawesi Province. The minimum, maximum, mean and standard deviation of the T2DM prevalence rate are 0.43, 17.36, 5.75 and 3.78, respectively (Table 2). Three or more T2DM cases (3-4 cases) occurred in Bulukumba and Sidenreng Rappang districts.
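The five groups in the prevalence maps come from the natural break classification described in the Methods. An exact version of that idea can be sketched with dynamic programming, assuming the classic Fisher-Jenks criterion of minimizing the total within-class sum of squared deviations (equivalently, maximizing between-class variance). GIS packages may use different or approximate implementations, so this is not necessarily the tool used for Fig. 2.

```python
def jenks_breaks(values, n_classes):
    """Exact natural breaks: split sorted values into n_classes contiguous classes
    that minimize the total within-class sum of squared deviations.

    Returns [minimum, upper bound of class 1, ..., upper bound of class k].
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # Prefix sums give the squared-deviation cost of any run data[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(data):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total cost of placing data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i  # the last class begins at data[i]
    # Walk back through the table to recover each class's upper bound.
    uppers = []
    j = n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = start[k][j]
    return [data[0]] + uppers[::-1]

print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```

For the maps, the 24 district/city prevalence rates would be passed with `n_classes=5`; the cubic running time is negligible at that size.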
The prevalence rate of hypertension for the 2017-2018 period in each district/city is shown in Fig. 3. In general, the prevalence rate of hypertension for the 2017-2018 period is 2-3 cases per 100,000 population. Wajo, Luwu and Palopo City are areas with a prevalence rate of 5 to 7 cases per 100,000 population. The minimum, maximum, mean and standard deviation of the hypertension prevalence rate are 13.94, 51.73, 32.61 and 10.77, respectively (Table 2).
The Global Moran's I statistic was −0.0670 (p = 0.678) for the prevalence rate of T2DM and −0.0633 (p = 0.711) for the prevalence rate of hypertension. This indicates no significant spatial clustering: the prevalence of T2DM and hypertension tends to occur randomly. The parameter estimates of the spatial cluster analysis are shown in Table 3. Table 4 shows the estimated parameters of the Poisson regression model. The estimated coefficient of T2DM was 0.005, with a standard error of 0.0005 and a p-value < 0.001. However, the residual deviance of 1772.8 divided by its 22 degrees of freedom is far greater than one (about 80.6), indicating overdispersion in the model. We therefore used a GPR model to address this overdispersion. The estimated parameters of the GPR model are shown in the ANOVA table (Table 5).
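The overdispersion check and the interpretation of the Poisson coefficient can be reproduced from the reported numbers. This sketch assumes the usual Poisson residual deviance and the log link µ = exp(xβ) stated in the Methods; the 22 residual degrees of freedom correspond to 24 districts/cities minus two parameters (intercept and T2DM).

```python
import math

def poisson_deviance(y, mu):
    """Residual deviance of a Poisson fit: 2 * sum[y*log(y/mu) - (y - mu)], with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * total

def dispersion_ratio(deviance, df_residual):
    """Deviance per residual degree of freedom; values well above 1 suggest overdispersion."""
    return deviance / df_residual

# Values reported in the text: residual deviance 1772.8 on 22 degrees of freedom.
ratio = dispersion_ratio(1772.8, 22)
# Under the log link, a coefficient of 0.005 multiplies the expected number of
# hypertension cases by exp(0.005) for each additional T2DM case.
rate_ratio = math.exp(0.005)
print(round(ratio, 2), round(rate_ratio, 4))
```

A ratio near 1 would be expected if the Poisson variance assumption held; a ratio of about 80 is what motivated switching to the GPR model, whose dispersion parameter α absorbs the extra variance.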
The GPR parameter estimation produced a standard error of 0.0007 and a p-value < 0.001, indicating a significant association between the number of T2DM cases and the number of hypertension cases.

Discussion
Although they often have no obvious symptoms at first, T2DM and hypertension are early conditions that can lead to severe cardiovascular and cerebrovascular complications. Because T2DM and hypertension are associated with genetic variants known as single nucleotide polymorphisms (SNPs) 9, producing maps that reveal familial or territorial clusters is important for the effective detection of T2DM and hypertension 10.
In a study of 922 participants in Taiwan, 30 novel single nucleotide polymorphisms (SNPs) were associated with comorbid hypertension in T2DM patients after adjustment for age and body mass index (p-value < 1 × 10⁻⁴). A cumulative genetic risk score consisting of 14 of the 38 SNPs was important for hypertension and an increased propensity toward higher systolic blood pressure, and may contribute to hypertension in T2DM in that country 10. Another study in Malaysia involving 320 volunteers classified as hypertensive (163) or normotensive (157) showed that the TT genotype/T allele of the WNK4 gene was linked to a close relationship between hypertension and T2DM 11. While genetic studies remain limited in Indonesia, it has been documented that the rs87148 polymorphism, especially the CC genotype and C allele, and CAPN10 had significant associations with HbA1c level and increased T2DM vulnerability, respectively 12,13.
In this study, we used data on patients who visited health care facilities registered with the Social Security Administrator for Health (BPJS) in the 2017-2018 period. Our study showed that the prevalence of T2DM and hypertension in South Sulawesi Province was 1-2 and 2-3 patients per 100,000 population, respectively. These values are far lower than the national averages of 2% and 8.4% for T2DM and hypertension in the same period.
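Comparing a rate per 100,000 population with a national percentage requires putting both on the same scale. The conversion is a fixed factor of 1,000 (a percentage is a rate per 100 people); the helper names below are illustrative.

```python
def per_100k_to_percent(rate_per_100k):
    """Convert a rate per 100,000 people into a percentage (per 100 people)."""
    return rate_per_100k / 1000.0

def percent_to_per_100k(percent):
    """Convert a percentage into a rate per 100,000 people."""
    return percent * 1000.0

# 2 T2DM patients per 100,000 versus a national average of 2%.
print(per_100k_to_percent(2))   # 0.002 (percent)
print(percent_to_per_100k(2))   # 2000.0 (per 100,000)
```

On this scale, 1-2 registered patients per 100,000 is roughly three orders of magnitude below a 2% national prevalence, which is consistent with the registry capturing only patients who visited BPJS-registered facilities.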
Using spatial analysis, we found that T2DM and hypertension had similar distribution patterns: for each disease, the prevalence in one district/city was not correlated with the prevalence in neighboring districts/cities. The fact that both diseases are non-communicable may contribute to this result. Since the districts/cities in South Sulawesi Province are broadly similar in terms of human development index, community characteristics and health facilities, it is reasonable to propose that the distribution pattern of T2DM and hypertension is closely related to genetics and lifestyle. The strong gene-lifestyle interaction observed in the development of T2DM and hypertension further corroborates the pivotal role of genetic and lifestyle factors in the risk of these diseases [14][15][16].
The spatial regression and classical regression models explained 70% and 71%, respectively, of the variation in the data. The ANOVA table shows that T2DM has a p-value < 0.001, indicating a significant positive relationship between the number of T2DM cases and the number of hypertension cases. This is consistent with the research of Akalu and Yitayeh (2020) on all T2DM patients at Debre Tabor Hospital in Ethiopia, which found a hypertension prevalence of 59.5% among T2DM patients 17. Research conducted in Benghazi also reported hypertension prevalences of 85.6%, 54.2% and 56.3% among DM patients 18. In line with this finding, a seven-year study in Mexico City by Tsimihodimos et al. showed that 16% to 46% of subjects had hypertension, and the prevalence of T2DM among participants was around 20-39% 19. Moreover, half of the patients with T2DM in Japan also had hypertension 20.
Several translational studies have explored the mechanisms underlying the close relationship between T2DM and hypertension. For instance, patients with T2DM have increased peripheral arterial resistance caused by vascular remodeling, and increased body fluid volume associated with hyperinsulinemia and insulin resistance-induced hyperglycemia 17. In addition, a recent study identified GLP1R (glucagon-like peptide-1 receptor) expression in the carotid bodies (CBs) of spontaneously hypertensive rats as a novel signaling circuit that mediates hyperglycemia-induced peripheral chemoreflex sensitization and sympathetic overactivity, eventually exacerbating the hypertensive condition 21.
This study has limitations that call for further research. The data were secondary data released by BPJS. They are based on patient visits and lack a description of primary patient characteristics; moreover, patients who do not participate in the national health insurance scheme may not be recorded. Although this study involved all cities/districts in South Sulawesi Province, the province consists of only a small number (24) of cities/districts. Further studies that use individual patient-level data aggregated to grid cells (for instance 1 km × 1 km) instead of cities/districts may identify more specific high-risk areas and support stronger and more reliable statistical analysis. Because T2DM and hypertension are asymptomatic in the early stages, some people do not visit health care facilities, and such cases are not documented in health insurance records.
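The grid-based aggregation suggested above can be sketched as follows. This is a hypothetical illustration: it assumes patient locations are available as (x, y) coordinates in a projected coordinate system measured in meters (for example a UTM zone), which the current BPJS data do not provide, and the points below are made up.

```python
import math
from collections import Counter

def grid_counts(points, cell_size=1000.0):
    """Count points per square grid cell of side `cell_size` (default 1 km).

    Assumes (x, y) are projected coordinates in meters; a cell is identified
    by the integer indices (floor(x / cell_size), floor(y / cell_size)).
    """
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

# Hypothetical patient locations (meters), not real data.
cells = grid_counts([(500, 500), (900, 100), (1500, 200), (-10, 20)])
print(cells)
```

Grid-cell counts would still need a population denominator per cell (for example from a gridded population product) before prevalence rates and Moran's I could be computed at that resolution.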

Conclusions
A geospatial analysis of patients with T2DM and hypertension was carried out in this study. Two geospatial analyses were performed: spatial cluster analysis and spatial regression analysis. Globally, the numbers of T2DM and hypertension cases registered with BPJS Kesehatan tended to occur randomly. The results of the spatial regression analysis showed that a higher number of T2DM cases was significantly associated with a higher number of hypertension cases. These data may be used by policy makers to plan a comprehensive program to reduce the prevalence and risk of complications of these diseases.

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.